Non-Markovian Control with Gated End-to-End Memory Policy Networks
Abstract
Partially observable environments present an important open challenge in the domain of sequential control learning with delayed rewards. Despite numerous attempts during the last two decades, the majority of reinforcement learning algorithms and associated approximate models applied in this context still assume Markovian state transitions. In this paper, we explore the use of a recently proposed attention-based model, the Gated End-to-End Memory Network, for sequential control. We call the resulting model the Gated End-to-End Memory Policy Network. More precisely, we use a model-free value-based algorithm to learn policies for partially observed domains using this memory-enhanced neural network. The model is learnable end to end and features unbounded memory. Because of its attention mechanism and associated non-parametric memory, the proposed model, unlike recurrent models, lets us attend directly over the observation stream. We show encouraging results that illustrate the capability of our attention-based model on the continuous-state, non-stationary control problem of stock trading. We also present an OpenAI Gym environment for a simulated stock exchange and explain its relevance as a benchmark for the field of non-Markovian decision process learning.
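As a rough illustration of the idea in the abstract, the sketch below attends over a stored stream of observations of arbitrary length and maps the result to action values. It is a minimal sketch under stated assumptions, not the paper's implementation: the weight names (`A`, `C`, `B`, `W_T`, `b_T`, `W_q`), the dimensions, the number of hops, the sharing of weights across hops, and the omission of any temporal encoding are all hypothetical choices made for brevity. The gate is assumed to have a highway-style form, where a sigmoid of the controller state mixes the memory read-out with the previous state.

```python
import numpy as np

def softmax(x):
    # Numerically stable softmax over a 1-D array.
    e = np.exp(x - np.max(x))
    return e / e.sum()

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def init_params(obs_dim, d, n_actions, seed=0):
    # Hypothetical parameter set; names and shapes are assumptions.
    rng = np.random.default_rng(seed)
    return {
        "A": rng.normal(scale=0.1, size=(obs_dim, d)),   # input memory embedding
        "C": rng.normal(scale=0.1, size=(obs_dim, d)),   # output memory embedding
        "B": rng.normal(scale=0.1, size=(obs_dim, d)),   # controller (query) embedding
        "W_T": rng.normal(scale=0.1, size=(d, d)),       # gate weights
        "b_T": np.zeros(d),                              # gate bias
        "W_q": rng.normal(scale=0.1, size=(n_actions, d)),  # action-value head
    }

def gated_memory_q_values(observations, params, hops=2):
    """Return (q_values, attention) for an observation history of shape (T, obs_dim).

    T may be any positive length: the memory is just the stored observations,
    so it grows with the history instead of being squeezed into a fixed state.
    """
    m = observations @ params["A"]        # (T, d) memory keys
    c = observations @ params["C"]        # (T, d) memory values
    u = observations[-1] @ params["B"]    # (d,) query built from the latest observation
    p = None
    for _ in range(hops):
        p = softmax(m @ u)                # attention weights over the T observations
        o = p @ c                         # (d,) attention-weighted read-out
        t = sigmoid(params["W_T"] @ u + params["b_T"])  # gate in (0, 1)
        u = o * t + u * (1.0 - t)         # gated update of the controller state
    return params["W_q"] @ u, p
```

In a value-based setting, the returned vector would play the role of the action values from which a greedy or epsilon-greedy action is chosen; the training loop and the trading environment are left out here.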
Similar resources
A Non-Preemptive Two-Class M/M/1 System with Prioritized Real-Time Jobs under Earliest-Deadline-First Policy
This paper introduces an analytical method for approximating the performance of a two-class priority M/M/1 system. The system is fully non-preemptive. More specifically, the prioritized class-1 jobs are real-time and served with the non-preemptive earliest-deadline-first (EDF) policy, but despite their priority cannot preempt any non real-time class-2 job. The waiting class-2 jobs can only be s...
A Multiprocessor System with Non-Preemptive Earliest-Deadline-First Scheduling Policy: A Performability Study
This paper introduces an analytical method for approximating the performability of a firm real-time system modeled by a multi-server queue. The service discipline in the queue is earliest-deadline-first (EDF), which is an optimal scheduling algorithm. Real-time jobs with exponentially distributed relative deadlines arrive according to a Poisson process. All jobs have deadlines until the end of s...
Synchronization criteria for T-S fuzzy singular complex dynamical networks with Markovian jumping parameters and mixed time-varying delays using pinning control
In this paper, we discuss the issue of synchronization for singular complex dynamical networks with Markovian jumping parameters and additive time-varying delays through pinning control by Takagi-Sugeno (T-S) fuzzy theory. The complex dynamical systems consist of m nodes and the systems switch from one mode to another, a Markovian chain with glorious transition probabili...
Dipyridamole stress and rest gated 99mTc-sestamibi myocardial perfusion SPECT: left ventricular function indices and myocardial perfusion findings
Introduction: We investigated the difference in left ventricular ejection fraction (LVEF) and end-systolic volume (ESV) measured by gated myocardial perfusion SPECT (GSPECT) in the post-dipyridamole stress and rest periods, and compared the results with the perfusion patterns found in the conventional non-gated tomograms. Methods: 297 consecutive patients were studie...
Performance Modeling of Blocking Probability in Multihop Wireless Networks
Ad-hoc network communication requires efficient routing protocols to overcome the problems associated with unpredictable and dynamically changing topologies, which are mostly triggered by node mobility and the absence of base stations and central controllers. We propose a performance evaluation model for blocking probability of multihop calls in ideal macrocell environment conditions using t...
Journal title:
- CoRR
Volume abs/1705.10993, Issue -
Pages -
Publication date 2017